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Abstract: A simple Baycsian approach to nonparametric regression is de- 
scribed using fuzzy sets and membership functions. Membership functions are 
interpreted as likelihood functions for the unknown regression function, so 
that with the help of a reference prior they can be transformed to prior den- 
sity functions. The unknown regression function is decomposed into wavelets 
and a hierarchical Baycsian approach is employed for making inferences on the 
resulting wavelet coefficients. 
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1. Introduction 

Consider the model 

(1.1) y% = g(xi) + £j, i = 1, . ..,n, and Xi G T, 

where e = (si, £2, • • ■ , £ n )' ~ N(0, a 2 I), a 2 is unknown and <?(•) is a function defined 
on some index set T C 1Z 1 . Inferences about g such as its estimation and estimation 
error as well as model checking are of interest. 

Without parametric assumptions such as those leading to linear regression, this is 
a nonparametric regression problem. A Baycsian approach to (fully) nonparametric 
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regression problems typically requires specifying prior distributions on function 
spaces which is rather difficult to handle. The extent of the complexity of this 
approach can be gauged from sources such as Ghosh and Ramamoorthi [9], Lenk 
[10], and so on. Furthermore, quantifying useful prior information such as 11 g is 
close to (a specified function) go" is difficult probabilistically, whereas this seems 
quite straightforward if instead an appropriate metric on the concerned function 
space is used. This is where fuzzy sets or membership functions can be made use 
of. Before proceeding to this problem, let us recall the following details on fuzzy 
sets and membership functions. 

Definition 1.1. A fuzzy subset A of a space Q (or just a fuzzy set ^4) is defined 
by a membership function h a ■ Q — > [0, 1]. 

The membership function, h^g), is supposed to express the degree of compati- 
bility of g with A. For example, if Q is the real line and A is the set of points "close 
to 0", then /ia(0) = 1 indicates that is certainly included in A, but /i^(-05) = .01 
says that .05 is not really "close" to in this context. Similarly, if Q is a set of 
functions and A C Q is a set of functions "close" to a given function go, then 
/ia(<7o) = 1 indicates that go is certainly included in A; however, if = -01 

with gi(x) = Wgo(x) + 100 then g\ is not really "close" to go in this case. 

Note that even when Q = <d is the parameter space, a membership function Ka{6) 
is not a probability density or mass function defined on O, and hence cannot be used 
to obtain a prior distribution directly. Instead, as we have done in [4] , we propose 
that a reasonable interpretation for a fuzzy subset A of is that it is a likelihood 
function for 9 given A. (See also [12, 13]). This interpretation seems to be able to 
answer some of the questions regarding how appropriate the concept of juzziness is 
in modeling our perception of imprecision. French [7] discusses a few of these ques- 
tions. First of all, likelihood is an accepted means for modeling imprecision. Another 
important question is how to define tiAnB from fiA and hs for incorporating Ka and 
Kb in Bayesian inference. If A and B are independent, then interpreting \ia and 
hs as likelihood functions leads to the result that IiAnB = hAhs, for this purpose. 
Further, the qualitative ordering that underlies a membership function can also be 
investigated with this interpretation, in conjunction with a prior distribution, as we 
study later. See [4] for an application to hierarchical and robust Bayes inference. 

This paper is closely related to Angers and Delampady [3, 4]. In the latter, we 
used fuzzy sets to help specify a prior density on a finite parameter space, whereas 
in the former a nonparametric function estimator using wavelet decomposition was 
presented. It is therefore natural to explore whether a combination of these two 
ideas can be fruitfully employed. Towards this end, we adopt the semi-parametric 
approach of using the wavelet decomposition of a regression function along with 
a remainder, thus effecting a drastic reduction of the dimension of the parameter 
space. Next we propose to incorporate the imprecise prior information of the kind 
"the true regression function g is close to go" using membership functions. Wavelet 
decomposition of go provides the wavelet coefficients 6o which we take as the prior 
mean of the wavelet coefficients 9 corresponding to g. Precision of this prior mean 
estimate is unclear, so we treat the prior variance of these coefficients as a hyper- 
parameter with a second stage (reference) prior. This approach is also tied to the 
question of Bayesian robustness. If a membership function stating "we think g is 
close to go" is the only prior input that we intend to incorporate, how should we 
then proceed with the Bayesian inference related to gl We claim that the only 
natural approach is to study the robustness of the inferences resulting from the 
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class of prior densities compatible with this membership function (see Section 5.1). 
This approach is further explained in the following sections. 

This paper is organized as follows. In Sections 2 and 3, a brief summary of our 
previous work [3, 4] is presented. In Section 4, details on the posterior calculations 
are given. The main focus of this paper, namely, model checking using Bayes factors 
and fuzzy models, is discussed in Section 5. Simulations and practical examples are 
presented in the last section to illustrate this theme. 

2. Nonparametric regression and wavelets 

To proceed further, we assume that the regression function g and its prior guess go 
are in £2 and impose a wavelet structure on both of them. To do this, we consider 
a compactly supported wavelet function i\> £ C s , the set of real- valued functions 
with continuous derivatives up to order s. Thus, g{x) can be written as 

00 

9{X)= Ol k ^k(x)+Yl E Po,k^j,k{x) 

\k\<K j=Q\k\<Kj 



gj(x) + Rj(x), 



where 



9j(v)= E a k (j) k (x) +Y ^2 Pj,ki>j,k( x )i 

\k\<K j=0 |fc|<Kj 

00 

(2.1) Rj(x)= J2 E Mj,hl*), 

j = J+l \k\<Kj 

4>k(x) = (f){x - k), 

a k = 4>{x — k)g{x)dx and 



/3 j>k = / 2 j / 2 ip(2 j x - k)g{x)dx. 



T 

Here Kj is such that 4>k(x) and ipj lk (x) vanish on T whenever |fc| > Kj, and cf> is 
the scaling function ("father wavelet") corresponding to the "mother wavelet" ip. 
Such Kj 's exist (and are finite) since the wavelet function that we have chosen has 
compact support. (See [3] or [8] for details.) Since the number of observations is 
finite, only a finite number of parameters can be estimated. Therefore, as suggested 
in [3], the resolution level J should be chosen such that the total number of un- 
known parameters which need to be estimated is no larger than n, the number of 
observations. Since the total number of a and j3 parameters is bounded by 

l x 2 J+1 + + 1) + {14, + + 2) 

where lx , and 1$ represent the length of the support of T, ip(-) and (/>(■) respec- 
tively, we require this upper bound to be less than n. The optimal resolution level J, 
however, should be chosen using a Baycsian model selection technique, for example 
[.">]. Consequently, we propose that the function gj(-) be estimated from the data 
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and the function Rj(-) be considered as a "nuisance parameter" to be eliminated 
by integrating it out. 
Further we assume that 

oc 

9o 0*0 = ^2 a o,k(f>k(x) X] Po,j,k^j,k( x )> 

\k\<K 3=0 \k\<Kj 

where 

«o,fc = J 4>{x - k)g (x)dx, 

0o,j,k = J y' 2 ^{Vx - k)g (x)dx. 

Since go is given, the wavelet coefficients can be assumed known. 

From [1], it is known that j3j, k — 0(2 _:,s ) where s denotes the degree of "smooth- 
ness" of ip, that is tp(-) £ C s (s > 1/2). Further, some of the a priori information 
wavelet coefficients can be translated into 

(2.2) E[a k ] = a , k ; E[j3 j>k ] = /3 0j , fc ; 

(2.3) VarK) = r 2 ; Var(/3,-, fe ) = r 2 /2 2 ^; 

(see [3] for details). Here, r 2 is a first stage hyper-parameter, a suitable prior on 
which will be specified later. (It is also supposed that E[/3j jk ] = /3o,j,k = for 
j > J + 1.) However, the prior probability distribution on the coefficients a k and 
(3j^k itself will be specified by a membership function to be introduced later. 

Since there are only a finite number of j3j jk that can be estimated, we cannot 
expect to have a very informative prior distribution on the remainder term, Rj. 
However, to be able to proceed with the Bayesian approach, we need to assume that 
Rj(-) is a stochastic process. For simplicity and computational ease, we consider 
a Gaussian process with zero mean (compatible with E[(3j^} = fto,j,k = for 
j > J + 1). Specifically, following [3], we now assume that Rj(x) is a Gaussian 
process with mean function and covariance kernel t 2 Q(x, y) where 

(2-4) Q(x,y)= vfs E MWiAv)- 

j = J+l " \k\<Kj 

Note that a prior assumption such as (3j, k being independent normal random quanti- 
ties will naturally lead to this prior distribution as can be seen from the Karhuncn- 
Loeve expansion [2]. We, however, make the stronger assumption that the remainder 
Rj(-) itself is a zero-mean Gaussian but with an unknown prior variance component 
t 2 . Our reason for this assumption is that, once the optimal resolution level J is 
chosen using a powerful mechanism such as a Bayes factor, as we suggest later, the 
wavelet coefficients at higher resolutions are not expected to have substantial influ- 
ence on the wavelet smoother. Hence Rj(-) which is made up of these higher-order 
wavelet coefficients will also have negligible influence. 

3. Prior information and membership functions 

We have explained in the previous section that we would like to make use of im- 
precise prior information such as " g is close to go" by using a membership function 
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which translates this into a measure of distance between the corresponding wavelet 
coefficients. Let us examine the implications of assuming that the available prior 
information is quantified in terms of a membership function 



(3-1) h A (jg)=Z(j>(g,go)), 

where p is a measure of distance. Due to the wavelet decomposition assumed on g 
as well as go (see Section 2), a natural choice for p is the £2 distance given by 

(3-2) P 2 {g,g ) = \\g - g a \\ 2 = - O^) 2 , 

where and 6® k are, respectively, the wavelet coefficients of g and go. Using 
Parseval's identity, note that 

00 

P 2 (.9,.9o) = ||s - 9of = Y ( ak ~ a °< fe ) 2 + Y Y (&- k ~ ^oj.fc) 2 • 

\k\<K 3=0 \k\<Kj 

Since we cannot estimate (3j t h for j = J + 1, . . . , and because flj^ = 0(2^^) (see 
[1]) we then have 

(3-3) 

j 

P 2 (.9,5o)= Y ( a k ~ ao,kf + Y Y (Pj,k ~ Po,j,k) 2 

\k\<K„ j=0 \k\<Kj 



00 



+ Y Y (&,fc-Aw0 2 

j=J+l \k\<Kj 
J 00 

= y (^-a Q , k ) 2 +Y Y Y Y °( 2 " 2JS ) 

\k\<K a 3=0 \k\<Kj j=J+l\k\<K j 

= Y K-«o, fe ) 2 +E Y - /W + 0(2- 

|fc|<-Ko 3=0 |fc|<ifj 

(3.4) =p 2 7 (.g,.g ) + O(2- 2(J+1)s ). 

For this reason, to quantify the imprecise prior information, wc will use a member- 
ship function that will depend only on p 2 (g,go)- Some possibilities for h A are the 
following: 

(i) The Gaussian membership function given by 

(3.5) h A (g) =exp(Vj(.9,.9o)) = exp(-a||# - 0°|| 2 ). 

This membership function can be explained as follows. Suppose we have available 
some past data of the form 

y*=9(x*) + £i, i = l,...,n*, 

with £j denoting i.i.d. normal errors, and suppose g is estimated from this data by g. 
Then the information in this data may be quantified using a membership function 
of the type 

Ms) = exp(-Q|| ff -.g|| 2 ) = exp(-c^(^- - 6 3 ) 2 ). 
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go may then be identified with g. If we have multiple past data sets, we may then 
have available h Al (g) = exp(— a±\\g — <h|| 2 ), hA 2 (g) = exp(— Oi2\\g — 32II 2 ), and so 
on, which may be combined into 

h A {g) = h AinA2 (g) = h Al (g)h A . 2 (g) = exp(- {ai | \g - g x | | 2 + a 2 \\g - h\\ 2 })- 

As an example one could consider fitting regression lines to two (or more) sets of 
past data with possibly different error variances and use the fitted regression lines 
along with the estimated variances for constructing the membership functions. This 
justifies to some extent our previous suggestion that membership functions quantify 
prior information in the sense of the likelihood. The constants a\ and 012 provide 
additional scope for assigning different weights to the two sources of information, 
which is another appealing feature of this approach. 

(ii) The multivariate t membership function 

(3.6) h A (g) = (1 + p 2 r (g, 9a )y iP+q)/2 = (l + (* - ^)'V~\e 0«)/ q y [p+q)/2 , 

where q > 2 is the degrees of freedom and p denotes the dimension of 6. This is a 
continuous scale mixture of Gaussian membership functions with the same go for 
each of the membership functions. Since this vanishes more slowly than (3.5), one 
could expect better robustness with this. 

(iii) The uniform function 



MflO 



1, tfpj(g,go)<8\ 

0, otherwise. 



This is an extreme case where g is restricted to a neighborhood of go- 

In order to proceed with Bayesian inference on g, we need to convert the mem- 
bership function into a prior density. This is done as in [4] with the aid of a reference 
prior density ttq- Thus we obtain the prior density 

ir(g) (x MsKoCff), 

or, upon utilizing the wavelet decomposition for <?, we have an equivalent prior 
density 

TT(9,a 2 ) oc M0)7r o (0, ° 2 )- 



4. Posterior calculations 



As in [:>], let Q n = (Q n )ii, 1 < i,l < n, where (Q n )il = Q(xi,xi), which was 
introduced in (2.4). Note that 

(Qn)il = 2- 2js ij jk {x i y 3k (x l ). 
j>J+l \k\<Kj 

Let X = ($', S') with the ith row of $' being {4>k{xi)}\ k \ <Ko and the ith row of S' 
being {ipjk(xi)}'\ k \ <K . fi<J<J . Then we have the model 

(4.1) y\0,a 2 ,T 2 ~ N(X9,a 2 I n +T 2 Q n ). 

Unless 9 has a normal prior distribution or a hierarchical prior with a conditionally 
normal prior distribution, analytical simplifications in the computation of posterior 
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quantities are not expected. For such cases, we have the joint posterior density of 
the wavelet coefficients 9 and the error variances a 2 and r 2 given by the expression 

n(9,a 2 ,T 2 \y) oc f(y\0, a 2 , T 2 )h A {9)n Q {9, a 2 , t 2 ), 

where / is the likelihood. From (4.1), / can be expressed as 

f(y\6,a 2 ,T 2 ) oc \a 2 I n + r 2 Q n \- 1 ' 2 

x exp(-i{(y - X9)\o 2 I n + r 2 Q n )-\y - X6)}). 

Proceeding further, suppose ttq of the form 

(4.2) 1T (8,<T 2 ,T 2 )=TT 1 (<J 2 ,T 2 ), 

which is constant in 9, is chosen. 

MCMC based approaches to posterior computations are now readily available. 
For example, Gibbs sampling is straightforward. Note that the conditional posterior 
densities are given by 

(4.3) 7r(0|y, a 2 ,r 2 ) cx exp(-i{(y - X9)'(a 2 I n + T 2 Q n )-\y - X9)})h A (9), 
(4.4) 

n(a 2 \y,9,T 2 ) oc \a 2 I n + ^Q^ 1 ' 2 

exp(-i{(y - X9)'(a 2 I n + r 2 Q„)- 1 (y - X9)})n 1 (a 2 , r 2 ), 

(4.5) 

7r(T 2 \y,6,a 2 ) oc^^ + ^Q^- 1 / 2 

exp(-i{(y - X9)'(a 2 I n + r 2 Q n )- 1 (y - X9)})tt 1 (<j 2 , r 2 ). 

However, major simplifications arc possible with the Gaussian h A as in (i). In this 
case, the posterior analysis can proceed along the lines of [3], Specifically, assuming 
that h A {9) is proportional to the density of N(6q, t 2 T) with 

hK +i 
A M , 3 

where Mp = ^2j =a (2Kj + l) an£ l with t 2 A being the variance-covariance matrix of (i 
(which is also diagonal, having the diagonal entries specified by Var(/3jfc) = t 2 /2 2: > 8 ), 
we obtain 



(4.6) y\9,a 2 ,r 2 ~ N(X6,a 2 I n +r 2 Q n ), 

9\t 2 ~ N{9 ,T 2 T). 

Therefore, it follows that 

(4.7) y^V 2 ~ N(X9 a ,a 2 I n +T 2 (XTX' + Q n )), 

(4.8) 9\y,a 2 ,r 2 ~ N(9 Q + A(y - X0 ), B), 

where 

A = t 2 TX' {a 2 I n +T 2 {XTX' + Q n ))~\ 

B = r 2 r - t 4 TX' (a 2 I n + t 2 (XTX' + Q„)) _1 XT. 
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Now proceeding as in [3], we employ spectral decomposition to obtain XTX' + Q„ = 
HDH' , where D = diag(e?i, c^, . . . , d n ) is the matrix of eigenvalues and H is the 
orthogonal matrix of eigenvectors. Thus, 

a 2 I n + t 2 {XTX' + Q n ) = H (a 2 I n + t 2 D) H' = a 2 H (I n + uD) H' , 

where u = t 2 /a 2 . Then, the first stage (conditional) marginal density of y given 
a 2 and u can be written as 



™(y I ct 2 ,u) = 



f 



(4.9) 



(27n7 2 )™/2 det(/„ + uD) 1 / 2 

x exp |-^2 (y - Xe )'H(I n + uD^H'iy - X0 O )\ 

2<j 2 ^ 1 + udi I 



(27r(T 2 )n /2 nr=i(i+«di) i/2 



exp 



where s = {s\, . . . , s n )' = H'(y — X9o). We choose the prior on a 2 and u = t 2 /a 2 
qualitatively similar to that used in [ ] . Specifically, we take ir\ (a 2 , u) to be propor- 
tional to the product of an inverse gamma density {fc c_1 /r(c— 1)} exp(— k/ a 2 )(<j 2 )~ c 
for a 2 and the density of a F(b,a) distribution for u (for suitable choice of k, c, a 
and b). Conditions apply on a and b as indicated in [3]. 

Once 7Ti(<t 2 , u) is chosen as above, we obtain the posterior mean and covariance 
matrix of 6 as in the following result. 

Theorem 4.1. 

(4.10) E{0\y) =6 + TX'HE [(!„ + nD)" 1 | y] s, 



where the expectation is taken with respect to 
(4.11) 



^2i{u | y) oc 
and 
Var(0|y) = 

(4.12) 

where M(u) = 



,6/2 



(a + bu)( a + b )/ 2 



J(l + Udi) 



-1/2 



-(n+2c)/2 



2fc +E r 



udi 



1 



n + 2c 
1 



-E 



2fc + E T 



-TX'HE 



n + 2c 
+E [M{u)M(u)' | y] , 

TX'H(I n +uD)- 1 s. 



udi 

n 



udi 



(I n + uD)- 1 | y 



H'XT 



The proof of Theorem 4.1 readily follows upon using standard hierarchical 
Bayesian model techniques (see [8] , Section 9.1). The second part of Equation (4.10) 
can be viewed as a correction term added to the prior guess after observing the 
data y. 

Coming to the multivariate t form of Ha as in (ii) , we note that the multivariate 
t density is a continuous scale mixture of multivariate normal densities as, for 
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example, shown in [11], i.e., 9 has the multivariate t distribution with location 9$ 
and scale matrix V with density of the form (3.6) if and only if 

e\s 2 ~N(e 0) qs 2 v), (s'y'-xl 

This implies that with a tt as in (4.2), we obtain a hierarchical normal prior struc- 
ture for 9 with an additional hyper-parameter S 2 . Consequently, we can proceed 
with the posterior computations exactly as with the Gaussian Ha, except that we 
will have a two-dimensional integration after we simplify our calculations as above. 
However, MCMC techniques can easily handle this case. 

Computations with the Ha given in (iii) are more difficult. Analytical simplifica- 
tions as shown for the cases (i) and (ii) are not available here. Therefore, we utilize 
the MCMC computational scheme outlined in (4.3)-(4.5) above. Alternatively, the 
Metropolis-Hastings (M-H) algorithm may be employed. 



5. Model checking and Bayes factors 

An important and useful model checking problem in the present setup is checking 
the two models 

M : g = g versus Mi : g ^ g . 

Under Mi, (g = g{6), t 2 ,<t 2 ) is given the prior hA(0)^o(6, r 2 7 a 2 )I(g ^ go), whereas 
under Mo, 7To(<t 2 ) induced by tto(9, t 2 , a 2 ) is the only part needed. In order to 
conduct the model checking, we compute the Bayes factor, £?oi, of Mo relative to 
M x : 

m(y|M ) 



(5-1) Boi(y) 



m(y|Mi) ; 



where m(y|Mj) is the predictive (marginal) density of y under model Mj, i = 0, 1. 
We have 

m(y|M ) = J f(y\go,v 2 )M° 2 )dcT 2 

and 

m(y\M 1 ) = J f(y\e,T 2 ,a 2 )h A (9)TT (9,T 2 ,a 2 )d9dT 2 da 2 . 

Since improper priors can lead to difficulties in model checking problems, here we 
must employ proper priors. (Note that for estimation purposes, the noninformativc, 
improper prior (<r 2 ) _c corresponding to k — would have worked in the previous 
section.) We will develop the methodology here for the Gaussian membership func- 
tion only; the other cases are similar but computationally more intensive. 

Recall from the previous section that in the case of a Gaussian membership 
function h A , the posterior analysis is similar to that discussed in [3], and for the 
same reason, the computation of the Bayes factor is also similar to what was dis- 
cussed there. As in the previous section wo(9, r 2 , a 2 ) will be constant in 9, while 
a 2 is inverse gamma and is independent of v = a 2 /r 2 which is given the F a j, prior 
distribution. (Equivalently, u = 1/v = r 2 /a 2 is given the Ff,^ prior as before.) 
Specifically, 7r (cr 2 ) = {fc c_1 /r(c — 1)} exp(— k/a 2 )(a 2 )~ c , where c and k (small) 
are suitably chosen. Therefore, 

m(y|M ) = J f(y\go,<7 2 )M° 2 )d<T 2 = (2^)-" /2 f ^ T y 

, „ N -(n/2+c-l) 

T{n/2 + c-l)\k+-Y J ^-9o{x l )) 2 \ 
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Further, using (4.7), it follows that 



(5.2) m{y\M u a\u) = (2na 2 )- n / 2 + ud,)" V 2 exp -— £ _ i_ L , 

i=l I " i=l 1 J 

where s = (si, . . . , s„)' = i/'(y — X9q) as before. Therefore, 
m(y|Mi) = / m(y\Mi,a 2 ,u)tto(c 2 ,u) da 2 du 



h c-l r « 

= (2^-"/ 2 f ^- Ty yn(i+^)- i/2 -o(t 



i=i 



2)-(n/2+c) d(J 2 I ^ 



(5.3) = (2 7 r)-»/ 2 f ^-^r(n/2 + C -l) 



/H§^} n 



(1 + itdj) 1 ^ 2 7To(m) du, 



where ttq(u) denotes the Ff, ta density of u. Note that this involves only a straightfor- 
ward single-dimensional integration. The resulting Bayes factor is illustrated later 
in our examples. 

5.1. Prior robustness of Bayes factors 

Note that the most informative part of the prior density that we have used is 
contained in the membership function Ka- Since a membership function fiA(9) is to 
be treated only as a likelihood for 9, any constant multiple cIia{0) also contributes 
the same prior information about 9. Therefore, a study of the robustness of the 
Bayes factor that we obtained above with respect to a class of priors compatible 
with Ha is of interest. Here we consider a sensitivity study using the density ratio 
class defined as follows. Since the prior ir that we use has the form it(9,t 2 ,a 2 ) oc 
hA(0)no(9, r 2 ,a 2 ), we consider the class of priors 

C A = {it : c 1 h A {9)TT ^,r 2 ,<j 2 ) < an{9,T 2 7 a 2 ) < c 2 h A (9)Tro(9, r 2 , a 2 ), a > 0} , 

for specified < c\ < c 2 . We would like to investigate how the Bayes factor (5.1) 
behaves as the prior 7r varies in Ca- We note that for any it G Ca, the Bayes factor 
Bqi has the form 

I f(y\g Q ,o- 2 M9,T 2 ,o- 2 )d9dT 2 d<j 2 
01 / /(y|0, t 2 , a 2 )ir(9, t 2 ,o 2 ) d9 dr 2 da 2 ' 

Even though the integration in the numerator above need not involve 9 and r 2 , we 
do so to apply the following result (see [8], Theorem 3.9, or [4], Theorem 4.1). 
Consider the density-ratio class 

r du = {tt : L(rj) < an(rj) < U(rj) for some a > 0} , 

for specified non-negative functions L and U. Further, let q = q + + q~ be the usual 
decomposition of q into its positive and negative parts, i.e., q + (u) = max{q , (u),0} 
and q~ (u) = — max{— q(u), 0}. Then we have the following theorem (see [6]). 
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Theorem 5.1. For functions q\ and q 2 such that J | qi{rf) \U(ji) drj < oo, for i = 1,2, 
and with q 2 positive a.s. with respect to all ir G Ydr, 

inf 'L — 7-^ — — is the unique solution A of 
*er DR J q 2 {r])ir{ri) drj 



(5.4) ( qi (f 1 )-Xq 2 (r 1 ))-U(r,)dr 1 + (q 1 (r,)-Xq 2 (v)) + L(r 1 )dr, = 0, 



f qi{T))n{ri) drj . 
sup : t — r — r-r-r~ «s i/ie unique solution A o/ 



(5.5) J (q 1 ( V )-\q 2 ( v )) + U( V )d V + J (q l ( V )-\q 2 ( V ))-L(r ) )dr 1 = 0. 

We shall discuss this result for the Gaussian membership function only. Then, 
since the prior tt that we use has the form tt(8,t 2 , a 2 ) oc hj\(Q)TTo(T 2 , ct 2 ), and we 
don't intend to vary 7To(t 2 , er 2 ) in our analysis, we redefine Ca as 

C A = M0) : Cl h A {9) < ott(0) < c 2 h A (6),a > 0} , 

for specified < c\ < c 2 . Now, we re-express Bqi as 

J{Jf(y\g ,cT 2 )M<T 2 )d<r 2 }<0)M _ $ q^jO) d9 
J{Jf(y\e,T 2 : a 2 )TT (T 2 ,a 2 )dT 2 da 2 }Tr(9)d9 J q 2 (6)Tz{0) d6> 

where 



qi (9) = J f(y\g ,o- 2 )K (a 2 )do- 2 =m(y\M ), 
Q2(9)= J f(y\e,T 2 ,a 2 )no(r 2 ,a 2 )dT 2 da 2 . 



Then, Theorem 5.1 is readily applicable, and we obtain 
Theorem 5.2. 



inf Bqi(tt) is the unique solution A of 



(5.6) c 2 ( qi (9)-Xq 2 (9))-h A (9)d9 + Cl / (^(0) - \q 2 {6)) + h A {0) dO = 0, 



sup _Boi(tt) is £/ie unique solution A o/ 



(5.7) c 2 (q 1 (9)-\q 2 (9)) + h A (9)d9 + c 1 [ qi {9) - \q 2 {9))~ h A {9) d9 = Q. 
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6. Examples, simulations and illustrations 

Illustrative examples, simulated as well as those involving real- life data, are dis- 
cussed below. In all examples we have used two membership functions: 

1. Gaussian membership function: /^(s) proportional to the density of N(9q, t 2 T), 
where 9q is obtained from the wavelet decomposition of go. The hyper-parameters 
a and b (see (4.11) and (4.12)) are b = 3 and a = 8(b + 2)/(b - 2). Sensitivity 
analysis shows that the values of these hyper-parameters do not influence the 
results very much. To show that the other hyper-parameters c and k (see (4.11) 
and (4.12)) have some effect, though not substantial, we have displayed the 
wavelet smoothers (see Figures 1(a), 2(a) and 3(a)) for (c, k) = (2, 1.5), (1.5, 0.5) 
and (1.05,0.05). 

2. Uniform on the ellipsoid (see (iii) in Section 3): h,A{g) = I{pj(g,g )<8}- We have 
used three different values (0.5, 1 and 5) for S. 

For the simulated examples, we generated observations from the model (1.1) with 
the regression function g(x) = cos(27rx) where x is drawn from a uniform density 
on the unit interval. Then we considered three different prior guesses for go', (i) 
go(x) = cos(27rx) (see Figure 1), (ii) go(x) = 4\x — 0.5 1 — 1 (sec Figure 2), and (iii) 
go = (see Figure 3). 




0.0 0.2 0.4 0.6 0.8 1.0 



(b) Ellipsoid 

Fig 1. Wavelet smoother for g(x) = cos(27rx) and go (a;) = cos(27rx). 
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(b) Ellipsoid 

Fig 2. Wavelet smoother for g(x) = cos(2-7ra;) and go{x) = 4\x — 0.5| — 1. 



Note that in (i) the chosen go is the best possible prior guess. Further, since 
the normal prior is very informative, as expected the smoother (see Figure 1(a)) 
does an excellent job of extracting the true regression function. The behavior of 
the smoother obtained from the ellipsoid membership function (see Figure 1(b)) is 
similar to that seen in Figure 1 (a), even though the prior is different. 

The smoother presented in Figure 2 behaves very similar to what was seen in 
Figure 1. The prior guess, go, is slightly different from the true g. 

The behavior seen in Figure 3 emphasizes our comment following Figure 2. In 
fact, the smoother here looks better than the one in Figure 2. This is perhaps 
because the prior is less informative (concentrated) than the normal prior used 
there, so that the smoother can follow the data more closely than the prior. 

We next applied our wavelet smoother to the "Humidity data" example from 
(see [3]; also [8], Example 10.2). The variable of interest y that we have chosen 
from the data set is the weekly average humidity level. The observations were made 
from June 1, 1995 to December 13, 1998. We have chosen time (day of recording the 
observation) as the covariate x. Since we have 185 observations here, the maximum 
possible value for J is 6. 

In this data (see Figure 4(a)) a seasonality effect is present, so we have chosen 
go{x) = 22.5cos(2tt(.t + 0.1)/0.2) + 62.5, where a; = (day-June 1, 1995) /(December 
13, 1998 - June 1, 1995). This choice of go is rather arbitrary but it seems to follow 
the seasonal variations. For the analysis which led to Figure 4(a) the Gaussian 
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(b) Ellipsoid 

Fig 3. Wavelet smoother for g(x) = cos(27ra) and go = 0. 



membership function was used while in Figure 4(b), it was the ellipsoid one. In this 
latter figure, we have also added the lower and upper envelopes obtained from the 
prior (labeled Min/Max). From these two figures, it can be seen that the proposed 
estimator fits the data well and the final result does not depend much on the 
membership function used. 

6.1. Model checking 

The model checking approach based on Bayes factors developed in the previ- 
ous section has been tested on simulated examples. For this, we generated ob- 
servations form Equation (1.1) with the true function g(xi) = cos(2nxi) where 
Xi, i = 1,2,..., 20 were sampled from U(0, 1). For the error term, e^, we used 
a 2 = 0.1. Then, for illustration purposes, we considered three different go functions 
in (Mo : g = g ): 

(i) 9o(x) = cos(2?rx), 

(ii) g (x) = i\x - 0.5| - 1 and 

(iii) g = 0. 

Note that, the go function in (i) corresponds to the true function. The go functions 
in both (i) and (ii) are similar while the last one is very different from the true 
function g. Therefore, it is fair to assume that the Bayes factor (see equation (5.1)) 
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' q 1 

i \ • 
/* °'\ 2 



+ ++ °? + ^+°'* „ ° ; ^ \ £ i J 



(a) Gaussian 



(b) Ellipsoid 

Fig 4. Wavelet smoother for the "Humidity data" 



Table 1 

Bayes factor for M : g = g vs Mi : g ^ g 



9o 


•Boi(y) 


Evidence 


cos(2ttx) 


933.4275 


very strongly favors Mo 


A\x - 0.5| - 1 


57.4735 


strongly favors Mo 





7.2845 x 1CT 6 


very strongly favors Mi 



for the first two cases should not provide evidence against the model Mo : g = 
go while we can expect strong evidence in the case of the third function. These 
Bayes factors are given in Table 1. From this table, it can be seen that the model 
corresponding to the correct function (go(x) = cos(27rx)) obtains the largest Bayes 
factor followed by that for go{x) = 4\x — 0.5 —1. Moreover, if we test M : go(x) = 
against Mi : go(x) ^ 0, the Bayes factor favors Mi with strong evidence. 

7. Conclusions 



In this paper we suggest a simple approach to nonparametric regression by propos- 
ing an alternative to dealing with complicated analyses on function spaces. The 
proposed technique uses fuzzy sets to quantify the available prior information on a 
function space by starting with a "prior guess" baseline regression function go. First, 
wavelet decomposition is used to represent both the unknown regression function g 
as well as the prior guess go. Then the prior uncertainty of g relative to its distance 
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from go is specified in the form of a membership function which translates this 
distance into a measure of distance between the corresponding wavelet coefficients. 
Furthermore, a Bayesian test is proposed to check whether the baseline function g$ 
is compatible with the data or not. 

Acknowledgments. We thank the editors and two referees for providing critical 
comments which have brought significant improvements to our presentation. 
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